138 research outputs found

On Anonymizing the Provenance of Collection-Based Workflows

Author: Belhajjame Khalid
Publication venue: HAL CCSD
Publication date: 07/01/2020

In this paper, we examine the problem of anonymizing the provenance of collection-oriented workflows, in which the constituent modules use and generate sets of data records. Despite its popularity, this kind of workflow has been overlooked in the literature with respect to privacy. We therefore set out to examine the following questions: How can the provenance of a collection-based module be anonymized? Can lineage information be preserved? Beyond a single module, how can the provenance of a whole workflow be anonymized? As well as addressing these questions, we report on evaluation exercises that assess the effectiveness and efficiency of our solution. In particular, we tease apart the parameters that impact the quality of the obtained anonymized provenance information.

Automatic vs Manual Provenance Abstractions: Mind the Gap

Author: Alper Pinar
Belhajjame Khalid
Goble Carole A.
Publication date: 21/05/2016

In recent years, the need to simplify or hide sensitive information in provenance has given rise to research on provenance abstraction. In the context of scientific workflows, existing research provides techniques to semi-automatically create abstractions of a given workflow description, which are in turn used as filters over the workflow's provenance traces. An alternative approach commonly adopted by scientists is to build workflows with abstractions embedded into the workflow's design, such as sub-workflows. This paper reports on a comparison of manual versus semi-automated approaches in a context where result abstractions are used to filter report-worthy results of computational scientific analyses. Specifically, we take a real-world workflow containing user-created design abstractions and compare these with abstractions created by the ZOOM UserViews and Workflow Summaries systems. Our comparison shows that semi-automatic and manual approaches largely overlap from a process perspective; however, there is a dramatic mismatch in terms of the data artefacts retained in an abstracted account of derivation. We discuss reasons and suggest future research directions.

Comment: Preprint accepted to the 2016 workshop on the Theory and Applications of Provenance, TAPP 201

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Static Analysis of Taverna Workflows To Predict Provenance Patterns

Author: Alper Pinar
Belhajjame Khalid
Goble Carole
Publication venue: Elsevier BV
Publication date: 01/10/2017

The University of Manchester - Institutional Repository

SHARP: Harmonizing Galaxy and Taverna workflow provenance

Author: Belhajjame Khalid
Gaignard Alban
Skaf-Molli Hala
Publication venue: HAL CCSD
Publication date: 28/05/2017

SHARP is a Linked Data approach for harmonizing cross-workflow provenance. In this demo, we demonstrate SHARP through a real-world omic experiment involving workflow traces generated by the Taverna and Galaxy systems. SHARP starts by interlinking provenance traces generated by Galaxy and Taverna workflows and then harmonizes the interlinked graphs using OWL and PROV inference rules. The resulting provenance graph can be exploited to answer queries across Galaxy and Taverna workflow runs.

Community Profiling for Crowdsourcing Queries

Author: Brambilla Marco
Grigori Daniela
Belhajjame Khalid
Mauri Andrea
Publication date: 01/01/2014

Archivio istituzionale della ricerca - Politecnico di Milano

PAV ontology: provenance, authoring and versioning

Author: Belhajjame Khalid
Ciccarese Paolo
Clark Tim
Goble Carole
Gray Alasdair J. G.
Soiland-Reyes Stian
Publication date: 01/01/2013

Provenance is a critical ingredient for establishing trust in published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supporting evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose, and they allow and encourage extensions to cover more specific needs. We identify the specific need to distinguish between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. We present the Provenance, Authoring and Versioning ontology (PAV): a lightweight ontology for capturing just enough descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations, in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV, illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the PROV-O ontology to support broader interoperability. The authors strove to keep PAV lightweight and compact by including only those terms that have proven to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. We analyze and compare PAV with related approaches, namely Provenance Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their differences with PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms.

Comment: 22 pages (incl 5 tables and 19 figures). Submitted to Journal of Biomedical Semantics 2013-04-26 (#1858276535979415). Revised article submitted 2013-08-30. Second revised article submitted 2013-10-06. Accepted 2013-10-07. Author proofs sent 2013-10-09 and 2013-10-16. Published 2013-11-22. Final version 2013-12-06. http://www.jbiomedsem.com/content/4/1/3

arXiv.org e-Print Archive

Heriot Watt Pure

Harvard University - DASH

ZENODO

Springer - Publisher Connector

PubMed Central

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Efficient Feedback Collection for Pay-as-you-go Source Selection

Author: Belhajjame Khalid
Cortés Ríos Julio César
Fernandes Alvaro
Paton Norman
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2016

Article No. 1. Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, from which a small subset may be able to provide the information required to support a task. The number and rate of change of the available sources are likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to select judiciously the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to the targeting of data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is then evaluated on two different scenarios: mapping selection with synthetic data, and source selection with real data produced by web data extraction. The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.

Base de publications de l'université Paris-Dauphine

The University of Manchester - Institutional Repository

Privacy-preserving data analysis workflows for eScience

Author: Barhamgi Mahmoud
Belhajjame Khalid
Burégio Vanilson
Faci Noura
Maamar Zakaria
Soares Edvan
Publication venue: ZU Scholars
Publication date: 01/01/2019

©2019 Copyright held by the author(s). Computing-intensive experiments in modern sciences have become increasingly data-driven, perfectly illustrating the challenges of the Big Data era. These experiments are usually specified and enacted in the form of workflows that need to manage (i.e., read, write, store, and retrieve) sensitive data such as persons' past diseases and treatments. While there is an active body of research on how to protect sensitive data by, for instance, anonymizing datasets, there is a limited number of approaches that assist scientists in identifying the datasets generated by the workflows that need to be anonymized, and in setting the anonymization degree that must be met. We present in this paper a preliminary approach for setting and inferring anonymization requirements of datasets used and generated by a workflow execution. The approach was implemented and showcased using a concrete example, and its efficiency was assessed through validation exercises.

ZU Scholars (Zayed University)

EDBT/ICDT Workshops - Privacy-Preserving Data Analysis Workflows for eScience.

Author: Barhamgi Mahmoud
Belhajjame Khalid
Burégio Vanilson
Faci Noura
Maamar Zakaria
Soares Edvan
Publication venue: ZU Scholars
Publication date: 01/01/2019

ZU Scholars (Zayed University)

The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

Author: Bechhofer Sean
Belhajjame Khalid
Corcho Óscar
Garijo Daniel
Goble Carole
Gómez-Pérez José-Manuel
Hettne Kristina
Klyne Graham
Palma Raul
Zhao Jun
Publication date: 03/02/2014

Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.

Comment: 20 page

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository